New improvements in decoding speed and latency for automatic captioning
Authors
Abstract
In this paper, we present new improvements in decoding speed and latency for automatic captioning in telehealth. Complementary local word confidence scores are used to prune uncompetitive search paths. Subspace distribution clustering hidden Markov modeling (SDCHMM) is used for fast generation of acoustic and local confidence scores, where overlap accumulative probability (OAP) is used to measure the similarity of Gaussian pdfs in SDCHMM. To decrease latency, we propose pre-backtrace based on the detection of prosodic boundaries defined by unfilled pauses, filled pauses, and pitch contour. Experiments were conducted on a telehealth captioning task with vocabulary sizes of 21K and 46K. The proposed methods led to a 33% improvement in decoding speed without loss of word accuracy, and to a threefold decrease in maximum latency with about a 1.6% loss of word accuracy.
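The abstract does not define OAP, so the following is only a minimal sketch of one plausible reading: it assumes the similarity of two one-dimensional Gaussian pdfs is the area lying under both curves, i.e. the integral of their pointwise minimum. The function names `gaussian_pdf` and `overlap_area` are hypothetical and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def overlap_area(mean_a, std_a, mean_b, std_b, steps=20000):
    """Area shared by two 1-D Gaussian pdfs (ASSUMED reading of OAP).

    Integrates min(p_a(x), p_b(x)) with the trapezoidal rule over a range
    covering 8 standard deviations around both means. The result lies in
    [0, 1]: about 1 for identical pdfs, about 0 for well-separated ones.
    """
    lo = min(mean_a - 8.0 * std_a, mean_b - 8.0 * std_b)
    hi = max(mean_a + 8.0 * std_a, mean_b + 8.0 * std_b)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        # Trapezoidal rule: the two end points get half weight.
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * min(gaussian_pdf(x, mean_a, std_a),
                              gaussian_pdf(x, mean_b, std_b))
    return total * dx


if __name__ == "__main__":
    print(overlap_area(0.0, 1.0, 0.0, 1.0))   # identical pdfs
    print(overlap_area(0.0, 1.0, 2.0, 1.0))   # partially overlapping pdfs
    print(overlap_area(0.0, 1.0, 20.0, 1.0))  # well-separated pdfs
```

Under this assumption, a clustering step could merge the Gaussians whose overlap is largest. For two unit-variance Gaussians whose means are 2 apart, the closed-form overlap is 2Φ(−1) ≈ 0.317, which gives a quick sanity check on the numerical integration.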
Similar resources
Towards automatic closed captioning : low latency real time broadcast news transcription
In this paper, we present a low latency real-time Broadcast News recognition system capable of transcribing live television newscasts with reasonable accuracy. We describe our recent modeling and efficiency improvements that yield a 22% word error rate on the Hub4e98 test set while running faster than real-time. These include the discriminative training of a feature transform and the acoustic m...
Reinforced Video Captioning with Entailment Rewards
Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training. First, using policy gradient and mixed-loss methods for reinforcement learning, we directly optimize sentence-level task-based metrics (as rewards), achieving significant improvements over the baseline, based on both automatic m...
Multi-Task Video Captioning with Video and Entailment Generation
Video captioning, the task of describing the content of a video, has seen some promising improvements in recent years with sequence-to-sequence models, but accurately learning the temporal and logical dynamics involved in the task still remains a challenge, especially given the lack of sufficient annotated data. We improve video captioning by sharing knowledge with two related directed-generati...
A New Design for Two-input XOR Gate in Quantum-dot Cellular Automata
Quantum-dot Cellular Automata (QCA) technology is attractive due to its low power consumption, fast speed and small dimensions; it is therefore a promising alternative to CMOS technology. In QCA, the configuration of charges plays the role that current plays in CMOS. This replacement provides significant advantages. Additionally, the exclusive-or (XOR) gate is a useful building block in man...
Image Representations and New Domains in Neural Image Captioning
We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying image representation quality produced by a convolutional neural network, we find that a state-of-the-art neural captioning algorithm is able to produce quality captions even when provided with surprisingly poor image representations. We replicate this result i...